Code Librarian: A Software Package Recommendation System
Tao, Lili, Cazan, Alexandru-Petre, Ibraimoski, Senad, Moran, Sean
The use of packaged libraries can significantly shorten the software development cycle by improving the quality and readability of code. In this paper, we present a recommendation engine called Librarian for open source libraries. A candidate library package is recommended for a given context if: 1) it has been frequently used with the libraries imported in the program; 2) it has similar functionality to the libraries imported in the program; 3) it has similar functionality to the developer's implementation; and 4) it can be used efficiently in the context of the provided code. We apply a state-of-the-art CodeBERT-based model to analyse the context of the source code and deliver relevant library recommendations to users.
- Information Technology > Software (1.00)
- Information Technology > Artificial Intelligence > Natural Language (0.97)
- Information Technology > Artificial Intelligence > Machine Learning (0.96)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (0.85)
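The similarity-based criteria (2 and 3) amount to ranking candidate libraries by how close their embeddings are to an embedding of the current code context. The abstract names a CodeBERT-based encoder but gives no implementation details, so the sketch below uses toy vectors and plain cosine similarity as stand-ins; `embed` outputs, the candidate names, and the vector values are all illustrative assumptions, not the paper's actual model.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Toy vectors standing in for CodeBERT context/library embeddings (assumption).
context_vec = [0.9, 0.1, 0.3]
candidates = {
    "pandas": [0.8, 0.2, 0.4],
    "flask": [0.1, 0.9, 0.2],
}

# Rank candidate libraries by similarity to the code context.
ranked = sorted(candidates,
                key=lambda lib: cosine(context_vec, candidates[lib]),
                reverse=True)
print(ranked)
```

In a real system the co-occurrence criterion (1) would typically be combined with this similarity score, e.g. as a weighted sum, before ranking.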
AWS Sagemaker Workflow Management with Airflow
In this article, I will talk about my experience scheduling a data science project's notebooks on AWS SageMaker instances using Airflow. We have been using Netflix's papermill library to run Jupyter notebooks in production for more than two years now, and every day tens of SageMaker notebook instances are orchestrated by Airflow, working like a charm. You will read about the general architectural design of this system, how the day-to-day workflow looks, how roles and responsibilities are split between teams, and how you can implement it yourself. It all started with me reading an article on the Netflix blog about running Jupyter notebook files with external parameters to productionize data science workloads. This looked like a solution to a common problem I had faced at my previous company, where we were running Apache Spark applications with PySpark and other Python code for data science and reporting projects on AWS EMR.
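The core of the papermill approach is that a notebook run is just a command: papermill takes an input notebook, an output path, and a set of `-p key value` parameters that override a tagged parameters cell. A minimal sketch of building such a command, e.g. for an Airflow BashOperator or a SageMaker lifecycle script (the S3 paths and parameter names here are hypothetical, not from the article):

```python
import shlex

def papermill_command(input_nb, output_nb, params):
    """Build a papermill CLI invocation that injects external
    parameters into a Jupyter notebook via `-p key value` flags."""
    cmd = ["papermill", input_nb, output_nb]
    for key, value in params.items():
        cmd += ["-p", key, str(value)]
    # Quote each token so the string is safe to hand to a shell.
    return " ".join(shlex.quote(token) for token in cmd)

# Hypothetical paths and parameters for illustration.
cmd = papermill_command(
    "s3://my-bucket/notebooks/daily_report.ipynb",
    "s3://my-bucket/runs/daily_report-2024-01-01.ipynb",
    {"run_date": "2024-01-01", "env": "prod"},
)
print(cmd)
```

Writing the executed notebook to a separate output path is what makes this production-friendly: each run leaves behind a fully rendered notebook with its parameters, logs, and plots, which doubles as an audit artifact.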